Abstract:

The first experimental data from single-particle scattering experiments from free electron
lasers (FELs) are now becoming available. The first such experiments are being performed on
relatively large objects such as viruses, which produce relatively low-resolution, low-noise
diffraction patterns in so-called "diffract-and-destroy" experiments. We describe a very simple
test on the angular correlations of measured diffraction data to determine whether the
scattering is from an icosahedral particle. If this is confirmed, the efficient algorithm
proposed can then combine diffraction data from multiple shots of particles in random unknown
orientations to generate a full 3D image of the icosahedral particle. We demonstrate this with
a simulation for the satellite tobacco necrosis virus (STNV), the atomic coordinates of whose
asymmetric unit are given in Protein Data Bank entry 2BUK.

1. Introduction

The free electron lasers (FELs) now beginning to come online produce radiation many orders
of magnitude brighter than any existing source, and enable experiments previously the
domain only of science fiction. One such proposed experiment [1] envisages reconstructing the
3D structure of a microscopic entity such as a virus from many ultrashort diffraction patterns
of identical copies of the particle in random orientations, each recorded with a single pulse
of FEL radiation. Although the particles will undoubtedly suffer catastrophic radiation damage,
the ultrashort nature of FEL radiation is expected to produce diffraction patterns of the
particles before significant disintegration. An experiment on individual mimivirus particles
was reported recently [2]. The paper illustrates convincing diffraction patterns of the virus
particle in two different orientations, from which 2D projections of the particles are
reconstructed using an iterative phasing algorithm. Although such particles are known to be
largely icosahedral, little sign of the icosahedral shape is evident in the reconstructed
projections. Several algorithms have been proposed for reconstructing a full 3D image of the
particle from an ensemble of many such diffraction patterns from randomly oriented particles.
The methodology followed by some of these approaches [3,4,5] is to find the likely orientation
of each measured diffraction pattern in the 3D reciprocal space of the particle.



Another approach [6] dispenses with finding the likely orientations of the individual
diffraction patterns by integrating over orientations, in an attempt to find the spherical
harmonic representation of the 3D diffraction volume of a single particle from the averages of
the angular correlations of the intensities on the measured diffraction patterns. This method
of analysis is even applicable to individual diffraction patterns from multiple identical
particles [7]. The particles need to be frozen in space or time while the scattering is taking
place. If the scattering is from a single FEL pulse of radiation, the particles will be
essentially frozen in time for the duration of the scattering even if not frozen in space. This
opens this method to the analysis of scattering from particles in random orientations within a
droplet. With such an approach, the "hit rate" in an experiment with a FEL can become 100%,
whereas a low hit rate is to be expected when attempting to hit submicron particles with a
submicron pulsed laser beam. The signal-to-noise ratio from such snapshot patterns is
independent of the number of particles per shot, but increases with the square root of the
number of shots [8]. This approach also has the advantage that it operates on a compressed
version of the voluminous data produced by a FEL.

We point out here another advantage of this approach: it is easily amenable to simplifications
resulting from any known point-group symmetry of the particles under study. This is a powerful
advantage for the study of virus structure, which is dominated by that of the protein coat
enclosing the genetic material, DNA or RNA, which contains the instructions for the replication
of the virus. To quote from Caspar and Klug [9], "there are only a limited number of efficient
designs possible for a biological container which can be constructed from a large number of
identical protein molecules. The two basic designs are helical tubes and icosahedral shells".
Viruses have regular shapes since they are formed by the self-assembly of identical protein
subunits, which are coded by the limited quantity of genetic material capable of being stored
within the small volume enclosed by the protein coat. An icosahedron, for example, can be
formed by the self-assembly of at least 60 identical subunits. The genetic material needs to
code for just one of these subunits, a factor of at least 60 smaller than the entire structure.

2. Icosahedral Harmonics 

The first aim of this approach is to find the spherical harmonic representation of the intensity 
distribution of any resolution shell in the reciprocal space of a single particle. Any prior infor- 
mation about the nature of this distribution may be incorporated by limiting the set of spherical 
harmonics over which the summation is performed and by any relationship amongst the ampli- 
tudes of the different spherical harmonics which are a consequence of any known point-group 
symmetry. 

An obvious restriction on the form of the intensity distribution

$$ I(q,\theta,\phi) = \sum_{lm} I_{lm}(q)\, Y_{lm}(\theta,\phi) \tag{1} $$

is its known inversion (or Friedel) symmetry. Since

$$ Y_{lm}(\pi-\theta,\, \pi+\phi) = (-1)^{l}\, Y_{lm}(\theta,\phi) \tag{2} $$

it follows that a spherical harmonic expansion of an intensity distribution may contain only even
values of the angular momentum quantum number $l$. The fact that the intensity distribution
is real allows the restriction to a summation over just the so-called real spherical harmonics
(RSHs) $S_{lm}(\theta,\phi)$, defined by the combinations of spherical harmonics

$$ S_{lm}(\theta,\phi) =
\begin{cases}
\frac{1}{\sqrt{2}} \left[ Y_{lm}(\theta,\phi) + (-1)^{m}\, Y_{l,-m}(\theta,\phi) \right] & m > 0 \\
Y_{l0}(\theta,\phi) & m = 0 \\
\frac{i}{\sqrt{2}} \left[ Y_{lm}(\theta,\phi) - (-1)^{m}\, Y_{l,-m}(\theta,\phi) \right] & m < 0
\end{cases} \tag{3} $$


where the set of RSHs with $m \ge 0$ has a $\phi$ dependence of the form $\cos(m\phi)$, and
the set with $m < 0$ likewise has a $\phi$ dependence of the form $\sin(|m|\phi)$. If the
reconstructed intensity distribution has a mirror plane, this may be chosen to be the $x$-$z$
plane, i.e. the plane for which $\phi = 0$. Then (1) may be replaced by a summation over only
the subset of RSHs for which $m \ge 0$, and we may take

$$ I(q,\theta,\phi) = \sum_{l,\, m \ge 0} R_{lm}(q)\, S_{lm}(\theta,\phi). \tag{4} $$

Since both the right hand side (RHS) and the left hand side (LHS) of the above equation are
real, the coefficients $R_{lm}(q)$ may also be taken as real.
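
The properties used above can be checked numerically. The following is a minimal pure-Python sketch, not the authors' code: it assumes the Condon-Shortley phase convention for the complex spherical harmonics and the prefactors $1/\sqrt{2}$ (for $m > 0$) and $i/\sqrt{2}$ (for $m < 0$) in the RSH combinations, and the helper names `assoc_legendre`, `sph_harm`, `real_sph_harm`, `max_imag_part` and `max_parity_error` are hypothetical. It checks that the RSHs come out real and that the inversion rule (2) holds.

```python
import cmath
import math

def assoc_legendre(l, m, x):
    """Associated Legendre function P_l^m(x) for m >= 0 (Condon-Shortley phase included)."""
    pmm = 1.0
    if m > 0:
        somx2 = math.sqrt((1.0 - x) * (1.0 + x))
        fact = 1.0
        for _ in range(m):
            pmm *= -fact * somx2
            fact += 2.0
    if l == m:
        return pmm
    pmmp1 = x * (2 * m + 1) * pmm
    for ll in range(m + 2, l + 1):
        pmm, pmmp1 = pmmp1, (x * (2 * ll - 1) * pmmp1 - (ll + m - 1) * pmm) / (ll - m)
    return pmmp1

def sph_harm(l, m, theta, phi):
    """Complex spherical harmonic Y_lm(theta, phi); theta is the polar angle."""
    if m < 0:
        return (-1) ** (-m) * sph_harm(l, -m, theta, phi).conjugate()
    norm = math.sqrt((2 * l + 1) / (4 * math.pi)
                     * math.factorial(l - m) / math.factorial(l + m))
    return norm * assoc_legendre(l, m, math.cos(theta)) * cmath.exp(1j * m * phi)

def real_sph_harm(l, m, theta, phi):
    """Real spherical harmonic S_lm (assumed prefactors, see the text above)."""
    if m == 0:
        return sph_harm(l, 0, theta, phi)
    plus, minus = sph_harm(l, m, theta, phi), sph_harm(l, -m, theta, phi)
    if m > 0:
        return (plus + (-1) ** m * minus) / math.sqrt(2)
    return 1j * (plus - (-1) ** m * minus) / math.sqrt(2)

def max_imag_part(lmax, theta, phi):
    """Largest imaginary part of any S_lm up to lmax (should vanish)."""
    return max(abs(real_sph_harm(l, m, theta, phi).imag)
               for l in range(lmax + 1) for m in range(-l, l + 1))

def max_parity_error(lmax, theta, phi):
    """Largest violation of Y_lm(pi - theta, pi + phi) = (-1)^l Y_lm(theta, phi)."""
    worst = 0.0
    for l in range(lmax + 1):
        for m in range(-l, l + 1):
            inverted = sph_harm(l, m, math.pi - theta, math.pi + phi)
            worst = max(worst, abs(inverted - (-1) ** l * sph_harm(l, m, theta, phi)))
    return worst
```

Because odd-$l$ harmonics change sign under inversion, only even $l$ can contribute to a Friedel-symmetric intensity, which is the restriction stated in the text.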

Fig. 1. Visualization of real spherical harmonics (RSHs) of angular momentum quantum
numbers $l$ = 0, 1, and 2. The plots are made with MATLAB by an adaptation of software
by Denise L. Chan (available from the MathWorks web site).

The 3D polar plots of Fig. 1 display the familiar forms of the RSHs for the values $l$ =
0, 1, 2. Further point-group symmetries of $I(q,\theta,\phi)$ result in still further
restrictions on the allowed terms of the general expansion (1) above. When the 3D intensity
distribution has icosahedral symmetry, for example, (1) may be replaced by

$$ I(q,\theta,\phi) = \sum_{l} g_{l}(q)\, J_{l}(\theta,\phi), \tag{5} $$

where the quantities $J_{l}(\theta,\phi)$ are known as icosahedral harmonics (IHs), specified up
to and including $l = 30$ by only the angular momentum quantum number $l$. Since the orientation
of our reconstructed 3D intensity distribution in the frame of reference of the particle may be
chosen arbitrarily, the $x$-$z$ plane may be chosen to be the mirror plane, allowing the IHs in
(5) to be constructed from just the RSHs of non-negative $m$, i.e.

$$ J_{l}(\theta,\phi) = \sum_{m \ge 0} a_{lm}\, S_{lm}(\theta,\phi) \tag{6} $$

where the coefficients $a_{lm}$ are the real numbers for normalized RSHs tabulated by e.g. Jack
and Harrison (1975) [10], the ones for the lowest allowed even values of $l$ being reproduced in
Table 1. Since the IHs involve a sum over the magnetic quantum number, at least up to $l = 30$,
they depend on the quantum number $l$ only. The forms of the icosahedral harmonics of lowest
even degree, $l$ = 0, 6, 10, 12, and 16, are illustrated in Fig. 2 using the same 3D polar plots.

Fig. 2. Icosahedral harmonics of angular momentum quantum number $l$ = 0, 6, 10, 12 and
16. Each is a linear combination of the RSHs of the magnetic quantum numbers indicated.
Visualization by the same software as Fig. 1.

Note that since the RSHs $S_{lm}(\theta,\phi)$ are orthonormal with respect to integration over
a spherical shell, the icosahedral harmonics $J_{l}(\theta,\phi)$ will also be orthonormal with
respect to the same integration provided

$$ \sum_{m} a_{lm}^{2} = 1, \quad \forall\, l. \tag{7} $$

This condition is clearly satisfied by the coefficients in Table 1.

3. Reconstructing the Diffraction Volume 

The average angular correlations amongst the resolution rings of the different measured
diffraction patterns contain information about the 3D diffraction volume of a single particle.
Such angular correlations are defined by

$$ C_{2}(q,q',\Delta\phi)
 = \frac{1}{N_{p} N} \sum_{p} \sum_{n=0}^{N-1} I_{p}(q,\phi_{n})\, I_{p}(q',\phi_{n}+\Delta\phi)
 = \frac{1}{N_{p} N^{2}} \sum_{p} \sum_{m=0}^{N-1} I^{p}_{m}(q)\, I^{p}_{m}(q')^{*} \exp(-im\Delta\phi) \tag{8} $$

where $I_{p}$ is the intensity on diffraction pattern $p$, $N_{p}$ is the number of available
diffraction patterns from random orientations of the particle, $\phi_{n}$ is the $n$-th of $N$
discrete values of $\phi$, and

$$ I^{p}_{m}(q) = \sum_{n=0}^{N-1} I_{p}(q,\phi_{n}) \exp(-im\phi_{n}) \tag{9} $$

is the angular Fourier transform of each resolution ring of the $p$-th diffraction pattern.
Indeed, the fastest way to calculate the average angular correlation $C_{2}(q,q',\Delta\phi)$ is
to exploit the cross-correlation theorem: perform the angular Fourier transform (9) of each
individual diffraction pattern, take the product of the Fourier transform and its complex
conjugate, apply the inverse transform, and average the results over the diffraction patterns
(Eq. (9) plus the second equality of (8)).
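
The equivalence of the two forms of the angular correlation can be illustrated with a small numerical sketch. It is an illustration under stated assumptions rather than the authors' implementation: the rings are random test data, the sign convention $\exp(-im\phi_n)$ is assumed for the angular Fourier transform, the function names are hypothetical, and a plain $O(N^2)$ discrete Fourier transform stands in for the FFT that a practical code would use.

```python
import cmath
import math
import random

def ring_dft(ring):
    """Angular Fourier transform of one ring: I_m = sum_n I(phi_n) exp(-i m phi_n)."""
    n_phi = len(ring)
    return [sum(ring[n] * cmath.exp(-2j * math.pi * m * n / n_phi) for n in range(n_phi))
            for m in range(n_phi)]

def correlation_direct(ring_q, ring_qp):
    """(1/N) sum_n I(q, phi_n) I(q', phi_n + dphi) for every discrete dphi."""
    n_phi = len(ring_q)
    return [sum(ring_q[n] * ring_qp[(n + k) % n_phi] for n in range(n_phi)) / n_phi
            for k in range(n_phi)]

def correlation_fourier(ring_q, ring_qp):
    """Same quantity from the product of one transform with the conjugate of the other."""
    n_phi = len(ring_q)
    f_q, f_qp = ring_dft(ring_q), ring_dft(ring_qp)
    return [sum(f_q[m] * f_qp[m].conjugate() * cmath.exp(-2j * math.pi * m * k / n_phi)
                for m in range(n_phi)).real / n_phi ** 2
            for k in range(n_phi)]

def average_over_patterns(patterns, correlate):
    """Average the ring correlations over a list of (ring_q, ring_qp) pairs."""
    n_phi = len(patterns[0][0])
    total = [0.0] * n_phi
    for ring_q, ring_qp in patterns:
        for k, value in enumerate(correlate(ring_q, ring_qp)):
            total[k] += value
    return [t / len(patterns) for t in total]

# Hypothetical test data: 5 patterns, two rings each, 16 azimuthal samples per ring.
rng = random.Random(0)
patterns = [([rng.random() for _ in range(16)], [rng.random() for _ in range(16)])
            for _ in range(5)]
c2_direct = average_over_patterns(patterns, correlation_direct)
c2_fourier = average_over_patterns(patterns, correlation_fourier)
max_difference = max(abs(a - b) for a, b in zip(c2_direct, c2_fourier))
```

With an FFT in place of `ring_dft`, the Fourier route costs $O(N \log N)$ per ring pair instead of $O(N^2)$, which is the reason for preferring it.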

It has been shown [?] that if the data from enough diffraction patterns of randomly oriented
identical particles are averaged,

$$ C_{2}(q,q',\Delta\phi) = \sum_{l} F_{l}(q,q',\Delta\phi)\, B_{l}(q,q') \tag{10} $$

where

$$ F_{l}(q,q',\Delta\phi) = \frac{1}{4\pi} P_{l}\left[ \cos\theta(q)\cos\theta(q') + \sin\theta(q)\sin\theta(q')\cos(\Delta\phi) \right] \tag{11} $$

where $P_{l}$ is a Legendre polynomial of order $l$,

$$ \theta(q) = \pi/2 - \sin^{-1}(q/2\kappa), \tag{12} $$

$\kappa$ is the wavenumber of the incident beam, and

$$ B_{l}(q,q') = \sum_{m} I_{lm}(q)\, I^{*}_{lm}(q'). \tag{13} $$

Since in Eq. (10) the LHS may be found from experiment, and $F_{l}(q,q',\Delta\phi)$ is a known
mathematical function, $B_{l}(q,q')$ may be found by solving this equation. Due to its form,
$B_{l}(q,q')$ contains information about the 3D diffraction volume of the particle via the
spherical harmonic expansion coefficients $I_{lm}(q)$. Because the RSHs are related to the
regular spherical harmonics by a unitary transformation, one may also write

$$ B_{l}(q,q') = \sum_{m} R_{lm}(q)\, R_{lm}(q'). \tag{14} $$

This is a more convenient form since all quantities in this equation are real. If the
$R_{lm}(q)$ coefficients can be found from the deduced values of $B_{l}(q,q')$, the expression
(4) is just as convenient for reconstructing the 3D diffraction volume as (1).
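
Solving Eq. (10) for the $B_l(q,q')$ is a linear problem once the correlation is sampled at several values of $\Delta\phi$. The sketch below shows one way this could be done, by least squares on synthetic data; it is not the authors' solver. It assumes the $1/4\pi$ normalization of $F_l$, restricts the fit to a few even $l$, and uses hypothetical names (`legendre`, `f_l`, `solve_linear`, `fit_b`) and made-up values of $q$, $q'$, $\kappa$ and $B_l$.

```python
import math

def legendre(l, x):
    """Legendre polynomial P_l(x) by the standard three-term recurrence."""
    if l == 0:
        return 1.0
    p_prev, p = 1.0, x
    for n in range(1, l):
        p_prev, p = p, ((2 * n + 1) * x * p - n * p_prev) / (n + 1)
    return p

def theta_of_q(q, kappa):
    """Polar angle of a resolution ring on the Ewald sphere, Eq. (12)."""
    return math.pi / 2 - math.asin(q / (2 * kappa))

def f_l(l, q, qp, dphi, kappa):
    """Kernel F_l of Eq. (11), with an assumed 1/(4 pi) prefactor."""
    t, tp = theta_of_q(q, kappa), theta_of_q(qp, kappa)
    arg = math.cos(t) * math.cos(tp) + math.sin(t) * math.sin(tp) * math.cos(dphi)
    return legendre(l, arg) / (4 * math.pi)

def solve_linear(matrix, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (aug[i][n] - sum(aug[i][j] * x[j] for j in range(i + 1, n))) / aug[i][i]
    return x

def fit_b(c2, dphis, ls, q, qp, kappa):
    """Least-squares B_l from sampled C2(q, q', dphi) via the normal equations."""
    rows = [[f_l(l, q, qp, dphi, kappa) for l in ls] for dphi in dphis]
    normal = [[sum(r[i] * r[j] for r in rows) for j in range(len(ls))]
              for i in range(len(ls))]
    rhs = [sum(r[i] * c for r, c in zip(rows, c2)) for i in range(len(ls))]
    return solve_linear(normal, rhs)

# Synthetic check with made-up values (all hypothetical).
ls = (0, 2, 4, 6)
b_true = (5.0, 1.5, -0.7, 0.3)
q, qp, kappa = 2.0, 3.0, 10.0
dphis = [math.pi * k / 63 for k in range(64)]
c2 = [sum(b * f_l(l, q, qp, d, kappa) for l, b in zip(ls, b_true)) for d in dphis]
b_fit = fit_b(c2, dphis, ls, q, qp, kappa)
max_error = max(abs(a - b) for a, b in zip(b_fit, b_true))
```

With noisy experimental correlations and many more values of $l$, a regularized or orthogonality-based inversion may be preferable to plain normal equations.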

However, finding the correct $R_{lm}(q)$'s from known $B_{l}(q,q')$'s is still a formidable
task since it involves taking a matrix square root [?]. Such a square root is necessarily
ambiguous up to an orthogonal matrix, which cannot be found from the $B_{l}(q,q')$'s alone. In
principle, such an orthogonal matrix may be found from the so-called angular triple
correlations [11] or by an iterative phasing algorithm that alternately satisfies constraints
on the measured angular correlations and in the 3D space of the reconstructed intensity
distribution [12].

When the particle under study is known to have a high degree of symmetry, like an icosahedral
virus, this problem is greatly simplified. Comparing (4) and (5) with definition (6), one can
deduce that, for a diffraction volume with icosahedral symmetry,

$$ R_{lm}(q) = g_{l}(q)\, a_{lm}. \tag{15} $$

Substituting (15) into (14) we see that one may write

$$ B_{l}(q,q') = g_{l}(q)\, g_{l}(q') \sum_{m} a_{lm}^{2} \tag{16} $$

and using (7) this may be simplified further to

$$ B_{l}(q,q') = g_{l}(q)\, g_{l}(q'). \tag{17} $$

We see here the great advantage of using IHs rather than RSHs for this problem of icosahedral
symmetry. The sum over $m$ in the RHS of (14) has disappeared completely in the RHS of (17):
the RHS of this equation is just the product of two scalars. A diffraction volume of
icosahedral symmetry may be reconstructed via (5) if the coefficients $g_{l}(q)$ are known.
Since the other quantities in (15) are real, it is clear that $g_{l}(q)$ may be chosen to be
real. The magnitudes of the $g_{l}(q)$ coefficients may be found from the diagonal quantities
$B_{l}(q,q)$ deduced from the intensity autocorrelations on resolution ring $q$ via

$$ |g_{l}(q)| = \sqrt{B_{l}(q,q)}. \tag{18} $$

Thus, the only remaining task in determining the coefficients $g_{l}(q)$ is determining the
signs of these real numbers. A simple way is to notice that the expression (5) for the
intensity of a resolution shell of radius $q$ in the 3D diffraction volume may be rewritten

$$ I(q,\theta,\phi) = \sum_{l} |g_{l}(q)|\, \mathrm{sign}[g_{l}(q)]\, J_{l}(\theta,\phi) \tag{19} $$

where the only unknown quantities in the RHS are the signs of $g_{l}(q)$. Since the only
permitted values of the quantum number $l$ of the icosahedral harmonic coefficients $g_{l}(q)$
of a diffraction volume are the even permitted values up to $l = 30$, namely $l$ = 0, 6, 10,
12, 16, 18, 20, 22, 24, 26, 28, and 30, we attempted to determine these signs by an exhaustive
search over the $2^{12} = 4096$ combinations of signs, finding the combination that minimized

$$ \sum_{\theta,\phi} |I_{-}(q,\theta,\phi)| \tag{20} $$

where $I_{-}$ are the negative values of $I$, for a chosen resolution shell $q$. The physical
basis of this is simply that $I(q,\theta,\phi)$ has to be a non-negative quantity, and our best
approximation to this is a function with a minimum sum of the magnitudes of negative values. As
subsequent results show, this easily implemented prescription proved accurate enough to find a
good approximation to the correct signs of these coefficients for the chosen reference
resolution ring. To maximize the number of non-negligible magnitudes $|g_{l}(q)|$ we chose a
high-resolution ring. To avoid almost all values $|g_{l}(q)|$ being very small, and thus
subject to significant rounding errors, we found the best compromise was to choose the
reference ring to be one for which $q$ is a fraction of $q_{max}$, where $q_{max}$ is the value
of $q$ for the outermost resolution shell.
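
The exhaustive sign search can be sketched on a toy problem. The example below is only an illustration of the minimization of (20): it uses Legendre polynomials $P_l(u)$ on a 1D grid as hypothetical stand-ins for the icosahedral harmonics $J_l(\theta,\phi)$ (whose tabulated coefficients are not reproduced here), three terms instead of twelve, and magnitudes chosen so that the true function is $u^4 = \frac{1}{5}P_0 + \frac{4}{7}P_2 + \frac{8}{35}P_4$, which is non-negative everywhere.

```python
import itertools

def legendre(l, x):
    """Legendre polynomial P_l(x); a stand-in basis function, not an icosahedral harmonic."""
    if l == 0:
        return 1.0
    p_prev, p = 1.0, x
    for n in range(1, l):
        p_prev, p = p, ((2 * n + 1) * x * p - n * p_prev) / (n + 1)
    return p

def negativity(coeffs, degrees, grid):
    """Sum of the magnitudes of the negative values of the expansion on the grid."""
    total = 0.0
    for u in grid:
        value = sum(c * legendre(l, u) for c, l in zip(coeffs, degrees))
        if value < 0.0:
            total -= value
    return total

def best_signs(magnitudes, degrees, grid):
    """Try every sign combination and keep the one with the least negative content."""
    best_score, best_combo = None, None
    for signs in itertools.product((1, -1), repeat=len(magnitudes)):
        coeffs = [s * m for s, m in zip(signs, magnitudes)]
        score = negativity(coeffs, degrees, grid)
        if best_score is None or score < best_score:
            best_score, best_combo = score, signs
    return best_combo

degrees = (0, 2, 4)
magnitudes = (1 / 5, 4 / 7, 8 / 35)            # |coefficients| of u**4
grid = [-1.0 + 2.0 * k / 40 for k in range(41)]
signs = best_signs(magnitudes, degrees, grid)   # all-positive signs reproduce u**4
```

For the twelve allowed values of $l$ the same loop would visit 4096 combinations, which is why the search is cheap in practice.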

From Eq. (17), we see that the icosahedral harmonic expansion coefficients of the same
quantum number $l$ corresponding to a different resolution shell $q'$ are related to the now
known ones of resolution shell $q$ by the simple quotient

$$ g_{l}(q') = B_{l}(q,q') / g_{l}(q). \tag{21} $$

Thus, having found the coefficients $g_{l}(q)$ for a particular shell $q$, those of the other
shells $q'$ were determined from this simple quotient, involving the quantities $B_{l}(q,q')$
directly calculable from the average intensity cross-correlations between different resolution
rings on the measured diffraction patterns. Thus the exhaustive search through all $2^{12}$
combinations of signs needs to be performed only for a single resolution ring $q$.
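
The propagation from the reference ring to all other rings can be written in a few lines. This is a minimal sketch with made-up numbers: `shell_coefficients` is a hypothetical helper, the matrix `b_l` holds $B_l(q_i,q_j)$ for a single $l$, and the sign of the reference coefficient is assumed to have been fixed beforehand by the sign search.

```python
import math

def shell_coefficients(b_l, ref, ref_sign):
    """Recover g_l on every ring from one row of B_l.

    b_l[i][j] holds B_l(q_i, q_j); ref is the index of the reference ring and
    ref_sign (+1 or -1) is the sign of g_l on that ring.
    """
    g_ref = ref_sign * math.sqrt(b_l[ref][ref])       # magnitude from the diagonal
    return [b_l[ref][j] / g_ref for j in range(len(b_l))]  # quotient for other rings

# Hypothetical coefficients on four rings, used to build a rank-one B_l.
g_true = [0.5, -1.2, 2.0, -0.3]
b_l = [[gi * gj for gj in g_true] for gi in g_true]
g_found = shell_coefficients(b_l, ref=2, ref_sign=+1)
g_flipped = shell_coefficients(b_l, ref=2, ref_sign=-1)
```

A wrong reference sign flips every coefficient of that $l$ together, so the relative signs between rings are fixed by the data and only one sign per $l$ remains to be searched.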

A knowledge of the expansion coefficients for all the resolution shells should enable a
reconstruction of the 3D diffraction volume via (5). If this intensity distribution is
interpolated onto an oversampled [13] 3D Cartesian reciprocal-space grid, $(q_x,q_y,q_z)$ say,
an iterative phasing algorithm [14] may be applied to reconstruct the 3D electron density of
the scattering particle.

4. Numerical Tests 

A central thesis of this paper is that the scattered intensity from an icosahedral particle may
be represented by a sum of icosahedral harmonics. We first sought to verify this proposition by
calculating the spherical harmonic expansion coefficients of a simple icosahedral particle
via the expression

$$ A_{lm}(q) = i^{l} \sum_{j} f_{j}(q)\, j_{l}(q r_{j})\, Y_{lm}(\hat{\mathbf{r}}_{j}), \tag{22} $$

where $f_{j}(q)$ is the form factor of the $j$th atom, $\mathbf{r}_{j}$ is its coordinate, and
$j_{l}$ is a spherical Bessel function of order $l$.




Fig. 3. Regular icosahedron [15].

For our initial tests we simulated the scattering from an artificial molecule of identical atoms
at the vertices of a regular icosahedron (Fig. 3) of edge length 2 [15] (which we take to be
in Å units), with Cartesian coordinates (also assumed to be in Å):

(0, ±1, ±Φ)
(±1, ±Φ, 0)
(±Φ, 0, ±1)

where Φ is the golden ratio $(1+\sqrt{5})/2$.
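
The coordinates above can be checked in a few lines. The following sketch (function names are hypothetical) builds the twelve vertices, counts the pairs separated by the edge length 2, and confirms that a rotation by 180 degrees about the $z$-axis maps the vertex set onto itself, i.e. that $z$ is a 2-fold axis in this setting.

```python
import itertools
import math

PHI = (1 + math.sqrt(5)) / 2  # golden ratio

def icosahedron_vertices():
    """Vertices (0, ±1, ±PHI), (±1, ±PHI, 0), (±PHI, 0, ±1) of a regular icosahedron."""
    vertices = []
    for s1, s2 in itertools.product((1, -1), repeat=2):
        vertices.append((0.0, s1 * 1.0, s2 * PHI))
        vertices.append((s1 * 1.0, s2 * PHI, 0.0))
        vertices.append((s1 * PHI, 0.0, s2 * 1.0))
    return vertices

def edge_count(vertices, edge=2.0, tol=1e-9):
    """Number of vertex pairs separated by the edge length."""
    return sum(1 for a, b in itertools.combinations(vertices, 2)
               if abs(math.dist(a, b) - edge) < tol)

def has_twofold_z(vertices):
    """True if (x, y, z) -> (-x, -y, z) maps the vertex set onto itself."""
    def key(v):
        return tuple(round(c, 9) + 0.0 for c in v)  # "+ 0.0" turns -0.0 into 0.0
    original = {key(v) for v in vertices}
    rotated = {key((-x, -y, z)) for x, y, z in vertices}
    return original == rotated

vertices = icosahedron_vertices()
radius = math.dist(vertices[0], (0.0, 0.0, 0.0))  # equals sqrt(PHI + 2), about 1.902
```

Every vertex lies at the same distance from the origin, so in Eq. (22) all twelve atoms share a single value of $r_j$.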

The resulting calculated values of the amplitudes $A_{lm}$ (arbitrarily taking $f_{j}(q) = 1$,
$\forall j$) for all possible values of $l$ and $m$ are listed in Fig. 4. Note that the
amplitudes $A_{lm}$ are all real, and that, for the values listed, they are non-zero only for
$l$ = 0 and 6. The values for $l$ = 1, 2, 3, 4, 5, and 7 are all seen to be zero, corresponding
to non-existent icosahedral harmonics for these values of $l$. The restriction to these values
of $l$ will hold for any orientation of the icosahedron, since a rotation matrix (Wigner
D-matrix) will mix only amplitudes of different magnetic quantum number corresponding to the
same angular momentum $l$. The $z$-axis of the simple icosahedron used for this test is a 2-fold
rotation axis, not 5-fold, unlike e.g. Fig. 2 above, or Table 1 below. This is why, for $l = 6$,
the non-zero amplitudes occur at every other value of $m$, rather than at integer multiples
of 5 as they would if $z$ were chosen to be a 5-fold axis.

 l   m         Re          Im    |   l   m         Re          Im
 0   0    3.385138    0.000000   |   5  -3   -0.000000    0.000000
 1  -1   -0.000000    0.000000   |   5  -2   -0.000000    0.000000
 1   0    0.000000    0.000000   |   5  -1    0.000000    0.000000
 1   1   -0.000000    0.000000   |   5   0    0.000000   -0.000000
 2  -2   -0.000000   -0.000000   |   5   1    0.000000    0.000000
 2  -1    0.000000    0.000000   |   5   2   -0.000000    0.000000
 2   0   -0.000000    0.000000   |   5   3   -0.000000    0.000000
 2   1    0.000000    0.000000   |   5   4    0.000000   -0.000000
 2   2   -0.000000   -0.000000   |   5   5   -0.000000   -0.000000
 3  -3    0.000000   -0.000000   |   6  -6   -2.592501   -0.000000
 3  -2    0.000000    0.000000   |   6  -5    0.000000    0.000000
 3  -1   -0.000000    0.000000   |   6  -4   -3.139675   -0.000000
 3   0    0.000000    0.000000   |   6  -3    0.000000   -0.000000
 3   1   -0.000000    0.000000   |   6  -2    3.845301    0.000000
 3   2    0.000000    0.000000   |   6  -1    0.000000    0.000000
 3   3    0.000000   -0.000000   |   6   0    1.678227    0.000000
 4  -4   -0.000000    0.000000   |   6   1    0.000000    0.000000
 4  -3    0.000000    0.000000   |   6   2    3.845301    0.000000
 4  -2    0.000000    0.000000   |   6   3    0.000000   -0.000000
 4  -1    0.000000    0.000000   |   6   4   -3.139675   -0.000000
 4   0    0.000000    0.000000   |   6   5    0.000000    0.000000
 4   1    0.000000    0.000000   |   6   6   -2.592501   -0.000000
 4   2    0.000000    0.000000   |   7  -7    0.000000   -0.000000
 4   3    0.000000    0.000000   |   7  -6   -0.000000    0.000000
 4   4   -0.000000    0.000000   |   7  -5    0.00000     0.000000
 5  -5   -0.000000   -0.000000   |   7  -4    0.00000     0.000000
 5  -4    0.000000   -0.000000   |   7  -3    0.00000     0.000000

Fig. 4. Calculated values of the $A_{lm}$ coefficients (arbitrarily taking $f_{j}(q) = 1$,
$\forall j$) assuming 12 identical atoms at the vertices of a regular icosahedron. The first
two entries in each column in each line are the $l$ and $m$ values. The next two are the real
and imaginary parts of $A_{lm}(q)$. It will be seen that all coefficients are zero except those
for which $l$ = 0 or 6, and that all non-zero coefficients are real.

Of greater interest for our method are the allowed values of $L$ for the coefficients $I_{LM}$
of the spherical harmonic expansion of the scattered intensity. Since

$$ I(\mathbf{q}) = |A(\mathbf{q})|^{2}, \tag{23} $$

it must follow that if $A(\mathbf{q})$ has icosahedral symmetry, so must $I(\mathbf{q})$.
However, this is not entirely obvious from the relationship between the two sets of coefficients

$$ I_{LM}(q)
 = \sum_{lm,\,l'm'} A_{lm}(q)\, A^{*}_{l'm'}(q) \int Y_{lm}(\hat{\mathbf{q}})\, Y^{*}_{l'm'}(\hat{\mathbf{q}})\, Y^{*}_{LM}(\hat{\mathbf{q}})\, d\hat{\mathbf{q}}
 = \sum_{lm,\,l'm'} A_{lm}(q)\, A^{*}_{l'm'}(q)\, (-1)^{m'} \sqrt{\frac{(2l+1)(2l'+1)}{4\pi(2L+1)}}\; C^{L0}_{l0\,l'0}\, C^{LM}_{lm\,l'-m'} \tag{24} $$

where $C^{LM}_{lm\,l'm'}$ is a Clebsch-Gordan coefficient [16]. According to the usual theory of
the vector addition of angular momenta, the allowed values of $L$ are all integers in the range
from $|l - l'|$ to $l + l'$, with no obvious indication that $L$ = 1, 2, 3, 4, 5, and 7, for
instance, are forbidden. However, a straightforward evaluation of the $I_{LM}(q)$ coefficients
via (24) reveals this to be the case, as is seen from the tabulated values of these
coefficients in Fig. 5.

 L   M          Re          Im
 6  -3     0.000000   -0.000000
 6  -2    45.088181    0.000000
 6  -1     0.000000    0.000000
 6   0    10.678093    0.000000
 6   1     0.000000    0.000000
 6   2    45.088181    0.000000
 6   3     0.000000   -0.000000
 6   4   -36.814346   -0.000000
 6   5     0.000000    0.000000
 6   6   -30.398445   -0.000000
 7  -7     0.000000   -0.000000
 7  -6    -0.000000    0.000000
 7  -5     0.00000     0.000000
 7  -4     0.00000     0.000000
 7  -3     0.00000     0.000000



Fig. 5. Same as Fig. 4 except for values of the $I_{LM}(q)$ coefficients calculated by Eq. (24)
from the $A_{lm}(q)$ values in Fig. 4.




Fig. 6. Top part of the structure of the satellite tobacco necrosis virus (STNV) viewed down 
its 5-fold axis (from structure data in PDB entry: 2BUK) 



We next tested this on a realistic model of the small icosahedral virus, satellite tobacco
necrosis virus (STNV), whose atomic coordinates are deposited in the Protein Data Bank under
entry 2BUK (Fig. 6). We calculated $A(\mathbf{q})$ from the usual structure factor expression

$$ A(\mathbf{q}) = \sum_{j} f_{j}(q) \exp(i\mathbf{q}\cdot\mathbf{r}_{j}) \tag{25} $$

and constructed the diffraction volume from (23). By integrating over spherical shells of
$I(\mathbf{q})$ we evaluated the spherical harmonic expansion coefficients of the 3D diffraction
volume of STNV from

$$ I_{lm}(q) = \int I(\mathbf{q})\, Y^{*}_{lm}(\hat{\mathbf{q}})\, d\hat{\mathbf{q}}, \tag{26} $$

where $\hat{\mathbf{q}}$ is the unit vector $\mathbf{q}/q$, with this integration conveniently
performed by Gaussian quadrature [17]. Plots of the real and imaginary parts of $I_{lm}$ in
Fig. 7 clearly show the same trend of vanishing components corresponding to $l$ = 1, 2, 3, 4, 5,
and 7, and in addition vanishing components for $l$ = 8, 9, 11, 13, 14, and 15, exactly
consistent with the tabulated values of icosahedral expansion coefficients in Table 1. What is
more, it was found that

$$ (-1)^{m} I_{l,-m}(q) = I_{lm}(q), \tag{27} $$

the precise condition for the reality of the $R_{lm}(q)$ coefficients of the RSHs, and hence of
the icosahedral harmonic expansion coefficients $g_{l}(q)$ via (15).

Since this result is a consequence of the icosahedral symmetry of the diffraction volume
$I(\mathbf{q})$, it is to be expected of the diffraction volume of all icosahedral viruses
(assuming the protein coat to be the dominant scatterer). In view of (13), this must mean that
the $B_{l}(q,q')$ coefficients computed from the data of diffraction patterns of random
orientations of all icosahedral particles must have vanishing values for $l$ = 1, 2, 3, 4, 5,
7, 8, 9, 11, 13, 14, 15, ..., thus providing a very simple test of whether the diffraction
patterns measured in a "diffract and destroy" experiment with a FEL are from an icosahedral
particle.
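
One way such a test could be scored is sketched below. This is an illustration, not a procedure from the paper: it takes the diagonal values $B_l(q,q)$ for one ring (which are non-negative, being sums of squared magnitudes), and reports the fraction carried by values of $l$ outside the allowed icosahedral set up to $l = 30$. The function name, the example numbers, and any acceptance threshold are hypothetical.

```python
# Allowed values of l for icosahedral harmonics up to l = 30, as listed in the text.
ALLOWED_L = (0, 6, 10, 12, 16, 18, 20, 22, 24, 26, 28, 30)

def forbidden_fraction(b_diag):
    """Fraction of sum_l B_l(q, q) carried by l values forbidden by icosahedral symmetry.

    b_diag maps l -> B_l(q, q) for a single resolution ring q.
    """
    total = sum(b_diag.values())
    forbidden = sum(value for l, value in b_diag.items() if l not in ALLOWED_L)
    return forbidden / total

# Made-up examples: the first resembles an icosahedral particle, the second does not.
icosahedral_like = {0: 100.0, 2: 0.1, 4: 0.05, 6: 20.0, 8: 0.02, 10: 5.0}
generic_particle = {0: 100.0, 2: 30.0, 4: 10.0, 6: 5.0}
score_ico = forbidden_fraction(icosahedral_like)
score_generic = forbidden_fraction(generic_particle)
```

In practice the score would be examined over many rings and compared with the noise level of the measured correlations before drawing a conclusion.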

We assume that this is indeed found to be approximately true in practice. Even the so-called
icosahedral viruses may have appendages which break the icosahedral symmetry of the protein
coat, and of course the genetic material inside the protein coat would not be expected to have
this symmetry. However, if the bulk of the material of the virus may be assumed to constitute
the protein coat, the selection rule must hold approximately. The icosahedral structure of the
protein coat may then be found by an analysis of the large $l$ = 0, 6, 10, 12, 16, ...
$B_{l}(q,q')$ coefficients extractable from the average angular correlations of the
diffraction data.

5. Reconstruction of STNV from Simulated Diffraction Patterns 

We next attempted a reconstruction of satellite tobacco necrosis virus (STNV) from diffraction
patterns simulated for directions of incidence on a single particle drawn from a uniform
angular distribution in SO(3) [18]. For the model of STNV we took the data of the biological
assembly of STNV from PDB entry 2BUK. Due to the large number of atoms in this biological
assembly (~100,000), the most convenient way to do this was to take slices through a
precalculated 3D diffraction volume of this structure. Average angular correlations of these
simulated diffraction patterns were calculated by the formulae (8) and (9), and the
$B_{l}(q,q')$ coefficients were calculated from these by inverting Eq. (10).

Fig. 7. Real and imaginary parts of the $I_{lm}(q)$ coefficients calculated from the computed
diffraction volume of STNV. Each dot represents a value of the $(l,m)$ pair. Note that these
coefficients are largely absent for $l$ = 1, 2, 3, 4, 5, 7, 8, 9, 11, 13, 14, 15.

For the 10,000 simulated diffraction patterns in our test, this process took about a quarter
of an hour on a single processor of a desktop computer. In a real experiment, one may have
to deal with perhaps 100 times as many diffraction patterns, with more pixels per pattern, so
the processing time could be several orders of magnitude greater. However, the bulk of the
time will be spent in generating the average angular correlations $C_{2}(q,q',\Delta\phi)$ of
(8), a process which easily lends itself to parallelization, since subsets of the diffraction
patterns may be averaged by separate computer processors, and the averages themselves
subsequently averaged. Nevertheless, this reduction of terabytes (TB) of measured experimental
data is probably the most computer-resource-intensive part of our method. Having thus reduced
our data to a set of $B_{l}(q,q')$ coefficients for a set of 30 values of $l$ and 61 values of
$q$ (and $q'$), we were left with a set of 30 x 61 x 61 real numbers which formed the input to
our reconstruction algorithm. This required about a MB of storage/memory. In a real experiment
also, our method requires the million-fold reduction of the TB of data to a MB of
floating-point (real) numbers that form the input to our reconstruction algorithm. It is
recommended that this data reduction be performed at the site of the data, to reduce by a
million-fold or so the quantity of data that needs to be transmitted over the internet to the
site where the image reconstruction is performed. At current rates, the transmission of
terabytes of data over the internet could take several weeks, whereas the time for the
transmission of a MB of data could be measured in seconds. In addition, this process of data
reduction is expected to result in considerable noise reduction of the raw data through
averaging [19].
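
The parallelization argument rests on the fact that averages over subsets can be recombined into the overall average, weighting each subset by its size. The sketch below illustrates this with random stand-in data and hypothetical function names; it also works out the storage needed for the reduced data, assuming 8-byte floating-point numbers.

```python
import random

def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]

def combine_chunk_means(chunk_means, chunk_sizes):
    """Recombine per-chunk means into the overall mean, weighting by chunk size."""
    total = sum(chunk_sizes)
    length = len(chunk_means[0])
    return [sum(mean[k] * size for mean, size in zip(chunk_means, chunk_sizes)) / total
            for k in range(length)]

# Stand-in data: 10 "patterns", each reduced to an 8-bin angular correlation.
rng = random.Random(1)
correlations = [[rng.random() for _ in range(8)] for _ in range(10)]

# Three unequal chunks, as if handled by three separate processors.
chunks = [correlations[:3], correlations[3:7], correlations[7:]]
combined = combine_chunk_means([mean_vector(c) for c in chunks], [len(c) for c in chunks])
direct = mean_vector(correlations)
max_difference = max(abs(a - b) for a, b in zip(combined, direct))

# Size of the reduced data set: 30 values of l, 61 x 61 ring pairs.
n_values = 30 * 61 * 61
n_bytes = 8 * n_values  # assuming 8-byte floats
```

The reduced set is 111,630 numbers, or roughly 0.9 MB at 8 bytes each, in line with the "about a MB" quoted in the text.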

Since the $B_{l}(q,q')$ coefficients are related to the expansion coefficients $R_{lm}(q)$ of
the real spherical harmonics (which satisfy the same selection rule on $l$ as do the expansion
coefficients $I_{lm}(q)$ of the regular spherical harmonics), it would be expected that the
$l$ = 0, 6, 10, 12, 16, 18, 20, 22, 24, 26, 28, 30 elements of these coefficients are dominant.
This was found to be the case for our simulations of STNV. Some of the larger, predominantly
icosahedral, viruses may have appendages like the unique vertex and "hair" of the mimivirus
[20], or the spike from a unique vertex of the chlorella virus [21]. Indeed, with values of
these coefficients extracted from experimental single-particle diffraction patterns from an
unknown particle, the satisfaction of this selection rule would be an excellent test of the
degree to which the particle is icosahedral. Inclusion of only the large $l$ = 0, 6, 10, 12,
16, ... $B_{l}(q,q')$ coefficients in the reconstruction algorithm, consistent with icosahedral
symmetry, is equivalent to finding the closest icosahedral approximation to the structure.




Fig. 8. Reconstructed image from the diffraction volume of a single STNV particle computed
directly from a structure factor calculation. STNV is about 20 nm in diameter. The figure
depicts a view of the icosahedron close to down its 5-fold rotation axis. The reconstruction
assumed a maximum value of $q$, $q_{max}$, of about 4.7 nm$^{-1}$, implying a resolution
of ~1.3 nm. Both the outer and inner surfaces of the virus capsid are apparent in this
representation. A ribbon diagram of the structure in PDB entry 2BUK is seen to fit within this
capsid.

The procedure described in section 3 was then followed to reconstruct a 3D diffraction volume,
consisting of a set of scattered intensities $I(q_x,q_y,q_z)$ over 3D reciprocal space as a
function of the reciprocal-space coordinate $\mathbf{q} = (q_x,q_y,q_z)$. In our simulations,
we took this to be a 61 x 61 x 61 array of real numbers. The computer time for this process was
almost ridiculously short, amounting to no more than a few seconds on a single-processor
desktop computer.

The final step is the recovery of a 3D electron density of the particle. This may be done by a
standard iterative phasing algorithm. We used the "charge flipping" algorithm of Oszlányi and
Sütő [22,23]. In order to judge the accuracy of our algorithm in recovering the 3D diffraction
volume, we performed this recovery of the 3D electron density from both the diffraction
volume $I(\mathbf{q})$ calculated directly from the STNV structure factors (Fig. 8), and the
diffraction volume recovered by our algorithm from the $B_{l}(q,q')$ coefficients (Fig. 9),
which may be computed from the measured data of the FEL diffraction patterns from random
particle orientations. The similarity of the reconstructed images of Figs. 8 and 9 was a
further indication of the validity of the method of image reconstruction from the quantities
$B_{l}(q,q')$ derivable from the average angular correlations. The fact that the reconstructed
image consists of a thin protein shell is also seen from the slice perpendicular to a 5-fold
rotation axis through the reconstructed image of Fig. 9, depicted in Fig. 10. In all three
figures, a ribbon representation of the structure from the biological assembly of STNV, from
the same PDB structure used to simulate the diffraction patterns, is superimposed on the
semi-transparent electron density to show the excellence of the reconstruction. It should be
emphasized that nowhere in our theory is it assumed that the structure consists of a thin
protein shell, unlike the so-called shell model that has been used in the SAXS analysis of
virus capsids [24]. In our case, the existence of a shell is deduced by an iterative phasing
algorithm from the analysis of data from diffraction patterns of random particle orientations
without any assumptions on our part.

Fig. 9. Same as Fig. 8 except that the diffraction volume was reconstructed from the average
of angular correlations on 10,000 diffraction patterns of STNV from uniformly distributed
directions over SO(3). The reconstructed electron density is seen to be remarkably similar
to that in Fig. 8.

Fig. 10. Same as Fig. 9 except that the image displayed is a cut perpendicular to the 5-fold
axis of the virus. The 5-fold symmetry of both the external and internal surfaces of the capsid
in this projection is clearly visible.

6. Beyond the Icosahedral Approximation 

Satellite tobacco necrosis virus (STNV) is an example of a virus with a perfectly icosahedral
protein coat [25]. A host cell gets access to the genetic material of this virus by ingesting it
whole and dissolving its protein coat.

Many of the larger viruses are only approximately icosahedral: they often have appendages,
such as a neck sticking out of the coat that is used to inject the genetic material inside the
coat into a host cell whose protein-making capability is hijacked by the virus DNA or RNA.

An ultimate reconstruction algorithm should be able to reconstruct these non-icosahedral 
parts of the structure in addition to the icosahedral part. The above procedure has determined the 
icosahedral harmonic expansion coefficients gi [q] that best fit the measured quantities Bi {q, q'). 
Any deviations from these values are due to the non-icosahedral parts of the structure. Any 
differences between the experimental values of Bi{q,q') and gi{q)gi{q') may be written 

5Bi{q,q') = J^ai,n{gi{q)5Ri,n{q') + 5Ri,n{q)gi{q)} + 5Ri,n{q)SRi,n{q'), (28) 

m 

in terms of $\delta R_{l,m}(q)$, the extra contribution to the RSH expansion coefficients due to deviations 
from icosahedral symmetry. Note that for $(l,m)$ combinations not associated with icosahedral 
harmonics, e.g. those for which there is no entry in a list like Table 1, the terms $a_{l,m}$ will be 
zero, and only the quadratic terms in $\delta R_{l,m}$ will survive in (28). Determination of the $\delta R_{l,m}(q)$ 
coefficients which optimize the agreement between the theoretical expression (28) and the measured 
values will enable the construction of a better estimate of a single-particle diffraction volume 
via 

$$I(\mathbf{q}) = \sum_{lm} \left[ g_l(q)\,a_{l,m} + \delta R_{l,m}(q) \right] S_{lm}(\hat{\mathbf{q}}). \tag{29}$$

The presence of the correction terms $\delta R_{l,m}(q)$, which have no symmetry restrictions (apart 
from Friedel symmetry), will allow the diffraction volume calculated by this formula to include 
deviations from icosahedral symmetry. 

Application of an iterative phasing algorithm to an oversampled diffraction volume calculated 
by this expression will enable the determination of the full structure of the virus, including 
any appendages that break the approximate icosahedral symmetry. 
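
The decomposition in Eq. (28) can be checked numerically. The following Python/NumPy sketch builds $B_l(q,q') = \sum_m I_{lm}(q) I_{lm}(q')$ from coefficients $I_{lm}(q) = g_l(q) a_{l,m} + \delta R_{l,m}(q)$ and compares the residual with the linear and quadratic terms of Eq. (28). The arrays are random placeholders rather than STNV data, the function name `delta_B` is hypothetical, and the normalization $\sum_m a_{l,m}^2 = 1$ is an assumption made here so that the icosahedral part of $B_l$ reduces to $g_l(q) g_l(q')$.

```python
import numpy as np

def delta_B(g, a, dR):
    """Right-hand side of Eq. (28) for a single degree l.

    g  : (nq,)     icosahedral radial coefficients g_l(q)
    a  : (nm,)     icosahedral-harmonic weights a_{l,m} (assumed unit norm)
    dR : (nm, nq)  non-icosahedral corrections delta R_{l,m}(q)
    """
    proj = a @ dR                                   # sum_m a_{l,m} dR_{l,m}(q)
    linear = np.outer(g, proj) + np.outer(proj, g)  # terms linear in dR
    quadratic = dR.T @ dR                           # sum_m dR_{l,m}(q) dR_{l,m}(q')
    return linear + quadratic

# Random placeholder data (illustrative only).
rng = np.random.default_rng(0)
l, nq = 6, 5
a = rng.normal(size=2 * l + 1)
a /= np.linalg.norm(a)                  # assumed normalization of a_{l,m}
g = rng.normal(size=nq)
dR = 0.1 * rng.normal(size=(2 * l + 1, nq))

I_lm = np.outer(a, g) + dR              # full RSH coefficients I_{lm}(q)
B_full = I_lm.T @ I_lm                  # B_l(q,q') = sum_m I_lm(q) I_lm(q')
residual = B_full - np.outer(g, g)      # delta B_l(q,q')
```

In a fit to measured data, `residual` would come from the experiment and `dR` would be the unknown to be optimized; the sketch only confirms that the two sides of Eq. (28) agree for a given `dR`.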

7. Discussion 

The remarkable similarity of the reconstructed electron densities of Figs. 8 and 9, and the fit 
of the latter to the model of STNV from the PDB file, are indications of the correctness of the 
method of reconstruction of the 3D diffraction volume from the average angular correlations 
of the 10,000 simulated diffraction patterns of STNV. We calculated from these the $B_l(q,q')$ 
coefficients for all values of $l$ from 0 to 30. We found good agreement with the selection rule 
on the $l$ coefficients, in which the sizes of the $B_l$ coefficients for all odd values of $l$ were 
small (due to Friedel, or inversion, symmetry) and, in addition, those for the even values $l$ = 2, 4, 8, and 
14 were also small, due to the icosahedral symmetry of the 3D diffraction volume of a single 
particle. We included $g_l(q)$ coefficients for the non-negligible $B_l(q,q')$ coefficients up to $l$ = 30 
(up to which value the icosahedral harmonic expansion coefficients depend on the $l$ quantum 
number only). If $q_{\max}$ is the maximum value of the reciprocal-space coordinate $q$ up to which 
the reconstruction is valid, conventional wisdom [26] suggests that $l_{\max}$ and $q_{\max}$ should be 
related by 

$$l_{\max} = q_{\max} R, \tag{30}$$

where $R$ is the radius of the particle. Taking 

$$q_{\max} = 2\pi/d, \tag{31}$$

where $d$ is the resolution, and substituting (31) into (30), we find after rearranging that 

$$d/R = 2\pi/l_{\max} \approx 1/5. \tag{32}$$

STNV has a radius of ~100 Å, suggesting a resolution of about 20 Å. In practice we found 
that increasing $q_{\max}$ by a further 50% or so, while keeping $l_{\max}$ fixed at 30, seemed to improve 
the quality of the reconstructed image, presumably because up to about $1.5\,q_{\max}$ the spherical 
harmonic expansion coefficients for $l$ greater than 30 remain small. 
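
As a worked check of the arithmetic, the short Python sketch below evaluates the relation $d = 2\pi R / l_{\max}$ for the values quoted in the text ($R \approx 100$ Å, $l_{\max} = 30$), together with the corresponding $q_{\max}$ and the value obtained after a 50% increase. The function name is hypothetical and the numbers are only those stated above.

```python
import math

def resolution_from_lmax(radius, l_max):
    """Resolution d implied by l_max = q_max * R with q_max = 2*pi/d."""
    return 2.0 * math.pi * radius / l_max

R = 100.0                                  # particle radius in angstroms (from the text)
l_max = 30

d = resolution_from_lmax(R, l_max)         # about 21 angstroms, i.e. d/R close to 1/5
q_max = 2.0 * math.pi / d                  # equals l_max / R, in inverse angstroms
q_extended = 1.5 * q_max                   # q_max increased by a further 50%
d_extended = 2.0 * math.pi / q_extended    # resolution at the extended q_max

print(f"d = {d:.1f} A, q_max = {q_max:.2f} 1/A, "
      f"extended q_max = {q_extended:.2f} 1/A, d = {d_extended:.1f} A")
```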

It should be emphasized that this is not necessarily an absolute limit of the resolution obtainable 
with the use of icosahedral harmonics. The higher order harmonics, at least up to $l$ = 44, have 
been tabulated by Zheng et al. [24]. At least up to this value, the degeneracy of the icosahedral 
harmonics characterized by a particular value of $l$ is no more than two. Although the algorithm 
for recovering the expansion coefficients of such degenerate harmonics from the experimental 
data is a little more complicated, it seems far from an insuperable problem. 

The images in Figs. 8 and 9 were computed by an iterative phasing algorithm [22,23] from a 
reciprocal-space distribution of intensities oversampled [13] by a factor of ~2 with respect to 
the size of STNV, up to a $q_{\max}$ value of ~0.47 Å$^{-1}$ (a 61×61×61 array), implying a resolution 
of about 13 Å, and a $d/R$ ratio closer to 1/8. Further, the images of Figs. 8-10 reveal the coat 
to be hollow. The slice (Fig. 10) through the reconstructed image perpendicular to the 5-fold 
axis reveals both external and internal surfaces of 5-fold rotational symmetry. The revelation of 
the hollow nature of the protein coat is of course an extra feature contained in the 3D intensity 
distribution above and beyond the assumed icosahedral symmetry. It is revealed by the iterative 
reconstruction algorithm used [22,23] due to the particular variation of the $B_l(q,q')$ coefficients 
with the radial reciprocal-space coordinates $q$ and $q'$. 

Some of the advantages of this method of analysis of single-particle diffraction patterns from 
unknown particle orientations compared with other proposed algorithms [5] should be pointed 
out. Since it has been shown [7,27,12,19,28] that the angular correlations of multiple identical 
particles in arbitrary orientations are essentially identical to those from a single particle, 
the method we have described is equally applicable to droplets containing multiple particles 
injected into the XFEL [29] as to the injection of single particles in random orientations. Thus 
there is no need to discard diffraction patterns from multiple-particle hits. 

Since the inputs to our algorithm are not the direct photon counts, but rather the average of the 
angular correlations between intensities of the same diffraction patterns, the algorithm is insensitive 
to shot-to-shot fluctuations between the diffraction patterns, as may be caused by intensity variations 
of the incident X-ray beam or, for example, by the number of particles scattering a particular 
X-ray pulse. 
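
To make the averaged quantity concrete, the following Python/NumPy sketch computes the angular correlation $C(q,q',\Delta\phi)$, taken here as the mean over $\phi$ of $I(q,\phi)\,I(q',\phi+\Delta\phi)$, for each pattern and then averages over shots. It assumes that every diffraction pattern has already been resampled onto a polar grid of shape (number of $q$ rings, number of azimuthal samples); the function names, the array layout, and the normalization are illustrative choices and not the code used for the simulations reported here.

```python
import numpy as np

def angular_correlation(polar):
    """Angular correlation of one pattern on a polar grid.

    polar : (nq, nphi) real array, I(q, phi) on uniformly spaced phi.
    Returns C with shape (nq, nq, nphi), where
    C[i, j, d] = mean over phi of I(q_i, phi) * I(q_j, phi + d).
    The circular correlation over phi is evaluated with FFTs.
    """
    F = np.fft.fft(polar, axis=1)
    cross = np.conj(F)[:, None, :] * F[None, :, :]
    return np.fft.ifft(cross, axis=2).real / polar.shape[1]

def averaged_correlation(patterns):
    """Average the angular correlations over all diffraction patterns."""
    return np.mean([angular_correlation(p) for p in patterns], axis=0)
```

Because the correlation is circular in $\phi$, rotating a pattern about the beam axis leaves its contribution unchanged, so patterns can be accumulated one at a time without any knowledge of the particle orientation.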

The raw experimental data is likely to consist of ~^ 10^ diffraction patterns, each of ~^ 10^ 
pixels. Thus the raw experimental data will require TB of storage. Of course, this is very noisy 
data, and the structural information content is much less than this. The averaging of the angular 
correlations that we perform may be regarded as a form of data averaging that results in 
information concentration and noise reduction. Even if the number of values of $q$ chosen is, say, 
61, and these coefficients are evaluated for, say, 30 values of $l$, the total number of these (real) 
coefficients will be only of the order of 100,000, requiring less than a MB of storage. This data 
reduction is best performed at the site of the data to allow the transfer of a million times less 
data over the internet to the site of image reconstruction. The reconstruction of 3D images of 
the quality of Figs. 8-10 from a properly constructed set of $B_l(q,q')$ coefficients is extraordinarily 
rapid. In our calculations, reconstruction of an array of $I(q_x,q_y,q_z)$ values representing a 3D 
diffraction volume at reciprocal-space coordinates $\mathbf{q} = (q_x,q_y,q_z)$ on a 61×61×61 Cartesian 
grid took just a few seconds on a single Intel Q6600 processor, using an Intel Fortran compiler. 
The reconstruction of real-space images of the quality of Figs. 8-10 from this array by means 
of a "charge flipping" algorithm [22,23] took a further 4 minutes for 200 iterations on a laptop 
PC. 
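
The storage estimate for the reduced data can be reproduced with a few lines of Python. The sketch counts one real value of $B_l(q,q')$ per combination of $l$, $q$ and $q'$ for the 61 values of $q$ and 30 values of $l$ quoted above; the choice of 8 bytes per value (64-bit floating point) is an assumption made here for illustration.

```python
n_q = 61          # number of q values (from the text)
n_l = 30          # number of l values (from the text)

n_coeff = n_q * n_q * n_l        # one real B_l(q, q') per (l, q, q')
size_bytes = n_coeff * 8         # assuming 64-bit floats

print(n_coeff, "coefficients,", round(size_bytes / 1e6, 2), "MB")
```

Since $B_l(q,q')$ is symmetric under exchange of $q$ and $q'$, roughly half of these values are redundant, so the figure is an upper bound on what needs to be stored.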

Of course, the averaging of the data from the different diffraction patterns assumes they all 
arise from copies of the particle in different orientations (as does the technique of small angle 
X-ray scattering, SAXS, for example). In order to distinguish between different conformations 
of the individual molecules, it may be necessary to operate on the entire ensemble of all the 
measured diffraction patterns. One of the disadvantages of such methods is the need to operate 
on single-particle diffraction patterns and thus, unlike with our method, diffraction patterns 
from multiple hits need to be removed. Such methods also face the problem of the uncertain 
normalization of incident intensities between successive pulses of incident radiation. Also, in 
contrast to our method, such techniques may require the transfer of perhaps TB of data over 
the internet to the site performing the analysis, which needs to be equipped with a 
cluster of computers performing parallel computations. 

8. Conclusions 

When reconstructing the structure of a virus from "diffract and destroy" type single-particle 
diffraction experiments proposed for the free electron laser [1], one may exploit the dictum 
of Caspar and Klug [9] that "there are only a limited number of efficient designs possible 
for a biological container which can be constructed from a large number of identical protein 
molecules. The two basic designs are helical tubes and icosahedral shells". We offer here a 
solution for the case of icosahedral viruses. For those viruses which are substantially, though not 
completely, icosahedral, the method proposed is expected to be useful nonetheless for initially 
reconstructing the approximate icosahedral structure. The deviations from this structure can 
then be found by a perturbation theory which does not impose this symmetry. 

The input to the algorithm is data from diffraction patterns of randomly oriented identical 
particles (where the particle orientations are unknown) in the form of the average of the angular 
correlations. As a consequence of any approximate icosahedral symmetry of the scattering 
particles, the angular momentum decomposition of the angular correlations contains only a few 
dominant contributions from low values of the angular momenta. 

This immediately suggests a simple test of whether the experimentally measured data are 
from the scattering by an icosahedral object. If so, the components of the quantities $B_l(q,q')$, 
derivable from the angular correlations, should have much smaller values for $l$ = 1, 2, 3, 4, and 5 
than for $l$ = 0 and $l$ = 6, for example. What is more, as we have shown in this paper, the coefficients 
$g_l(q)$ of the icosahedral harmonic expansion of the 3D diffraction volume of the particle may be 
derived from the $B_l(q,q')$ data and a positivity condition on the intensities of the 3D diffraction 
volume. This will be the case even if the individual diffraction patterns are a result of scattering 
from more than one particle, so there will be no need to discard the diffraction patterns from 
multiple particles. 
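
The test described above can be sketched in Python as follows. The counting formula in `n_icosahedral_harmonics` is not taken from this paper: it is a standard group-theoretical result for the number of independent icosahedral harmonics of even degree $l$, used here as an assumption and checked only against the selection rule quoted in the text ($l$ = 2, 4, 8, and 14 absent, and no more than two harmonics per degree up to $l$ = 44). The function names, the use of a single magnitude per degree, and the threshold `ratio` are likewise illustrative choices.

```python
import numpy as np

def n_icosahedral_harmonics(l):
    """Number of independent icosahedral harmonics of degree l.

    Odd degrees are excluded because of Friedel (inversion) symmetry.
    The even-degree count is an assumed standard formula (see lead-in).
    """
    if l % 2:
        return 0
    return l // 5 + l // 3 + l // 2 - l + 1

def looks_icosahedral(B_norm, ratio=0.1):
    """Crude selection-rule test.

    B_norm[l] is some magnitude of B_l(q, q') for l = 0, 1, 2, ...,
    for example its Frobenius norm over (q, q'). The data pass if every
    symmetry-forbidden degree is smaller than `ratio` times the weakest
    allowed degree.
    """
    B_norm = np.asarray(B_norm, dtype=float)
    if B_norm.size < 7:
        raise ValueError("need magnitudes for at least l = 0..6")
    allowed = np.array([n_icosahedral_harmonics(l) > 0
                        for l in range(B_norm.size)])
    return bool(B_norm[~allowed].max() < ratio * B_norm[allowed].min())
```

For instance, magnitudes that are large only at $l$ = 0 and $l$ = 6 pass the test, whereas comparable magnitudes at $l$ = 2 or $l$ = 4 cause it to fail. With noisy data, a fixed threshold of this kind would need to be replaced by a comparison against the estimated noise level.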

Having obtained the coefficients of an icosahedral harmonic expansion, the 3D diffraction 
volume may be reconstructed as a sum over these icosahedral harmonics. By definition, the 
resulting diffraction volume will have icosahedral symmetry. If this is constructed on a grid that 
is oversampled by a factor of 2 in each dimension, we have shown that a "charge flipping" algorithm 
with no fixed support constraint is able to reconstruct a 3D image of the particle. We find 
that this procedure not only reconstructs an icosahedral shape for the particle; in simulations 
for the satellite tobacco necrosis virus (STNV) it even reveals the hollow nature of the protein 
coat. 
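
For orientation, a minimal Python/NumPy sketch of a charge-flipping iteration is given below. It follows the general scheme of alternating between flipping the sign of low density values in real space and restoring the known Fourier moduli in reciprocal space, with no fixed support. It is not the implementation used for the reconstructions in this paper: the function name, the threshold choice `delta_frac`, the random starting phases, and the treatment of the zero-frequency term are all assumptions made for illustration.

```python
import numpy as np

def charge_flipping(intensity, n_iter=200, delta_frac=0.1, seed=0):
    """Minimal charge-flipping sketch on an oversampled intensity grid.

    intensity : real, non-negative array of any dimension, with the
                origin of reciprocal space at index 0 (np.fft layout).
    Returns a real density estimate of the same shape.
    """
    rng = np.random.default_rng(seed)
    amplitude = np.sqrt(np.clip(intensity, 0.0, None))
    # Phases of a random real array are Friedel-symmetric by construction.
    phases = np.angle(np.fft.fftn(rng.random(intensity.shape)))
    F = amplitude * np.exp(1j * phases)
    for _ in range(n_iter):
        rho = np.fft.ifftn(F).real
        delta = delta_frac * rho.std()            # flipping threshold (assumed)
        rho = np.where(rho < delta, -rho, rho)    # flip low-density voxels
        G = np.fft.fftn(rho)
        F = amplitude * np.exp(1j * np.angle(G))  # restore known moduli
    return np.fft.ifftn(F).real
```

A call such as `charge_flipping(I_grid, n_iter=200)`, where `I_grid` is a hypothetical 61×61×61 intensity array stored in FFT order, returns a density whose Fourier moduli match the input by construction; whether the density itself has converged to a physically sensible image must be judged separately.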

We acknowledge helpful discussions with Profs. Abbas Ourmazd and John Spence, and financial 
support from DOE grant No. DE-SC0002141. 



Table 1. Expansion coefficients of the lowest even degree icosahedral harmonics with the z-axis 
chosen to be the 5-fold rotation axis. For the list up to $l$ = 30, see e.g. [10]. Note the 
rows are characterized by $l$ and the columns by $m$. 
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